tg-me.com/Python4all_pro/1585
🖥 PDF Craft: a Python library for converting PDFs (primarily scanned books) to Markdown and EPUB, using local AI models and an LLM to structure the content
GitHub
Key features
- Text and layout extraction
Combines DocLayout-YOLO with its own algorithms to detect and filter headings, columns, footnotes, and page numbers
- Local OCR
Recognizes the text on each page with OnnxOCR; supports GPU acceleration (CUDA)
- Reading order detection
Uses LayoutReader to arrange the text in the order a human reader would follow
- Conversion to Markdown
Generates a .md file with relative links to images (illustrations, tables, formulas) stored in the assets folder
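Since the Markdown output points to its images through relative links, it can be handy to confirm that every link resolves after moving or archiving the result. The following is a minimal sketch using only the Python standard library; it does not call PDF Craft itself. The function names and the sample paths (such as assets/fig1.png) are hypothetical, chosen only for illustration.

```python
import re
from pathlib import Path

# Matches Markdown image syntax ![alt](target "optional title")
# and captures only the target path.
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def find_image_links(markdown_text: str) -> list:
    """Return the targets of all image links found in a Markdown string."""
    return IMAGE_LINK.findall(markdown_text)


def missing_assets(markdown_path: Path) -> list:
    """List relative image targets that do not resolve to an existing file."""
    text = markdown_path.read_text(encoding="utf-8")
    base = markdown_path.parent
    missing = []
    for target in find_image_links(text):
        if "://" in target:
            # Absolute URLs are not local assets, so skip them.
            continue
        if not (base / target).is_file():
            missing.append(target)
    return missing


if __name__ == "__main__":
    # Hypothetical sample text; a real run would pass the generated .md file
    # to missing_assets() instead.
    sample = '![Figure 1](assets/fig1.png)\n\n![Table](assets/table2.png "caption")\n'
    print(find_image_links(sample))
```

Calling missing_assets(Path("book.md")) on a converted book (the file name here is an assumption) returns an empty list when every referenced image is present next to the Markdown file.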
Installation and requirements
Python ≥ 3.10 (recommended 3.10.16).
Install with pip install pdf-craft and pip install onnxruntime==1.21.0 (or onnxruntime-gpu==1.21.0 for CUDA).
For the EPUB pipeline, you need access to an LLM service (for example, DeepSeek).
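The Python and ONNX Runtime requirements above can be checked before a first run. This is a rough sketch, not part of PDF Craft: check_environment and installed_version are hypothetical helper names, and the script only reads package metadata and asks ONNX Runtime which execution providers it reports.

```python
import sys
from importlib import metadata
from typing import Optional

REQUIRED_PYTHON = (3, 10)   # minimum Python version stated in the requirements
EXPECTED_ORT = "1.21.0"     # ONNX Runtime version pinned in the requirements


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment() -> dict:
    """Collect basic facts about the local setup without raising errors."""
    ort_cpu = installed_version("onnxruntime")
    ort_gpu = installed_version("onnxruntime-gpu")
    report = {
        "python_ok": sys.version_info >= REQUIRED_PYTHON,
        "pdf_craft": installed_version("pdf-craft"),
        "onnxruntime": ort_cpu,
        "onnxruntime_gpu": ort_gpu,
        "ort_version_ok": EXPECTED_ORT in (ort_cpu, ort_gpu),
        "cuda_available": False,
    }
    try:
        import onnxruntime as ort
        # The CUDA provider is listed only when the GPU build can offer it.
        report["cuda_available"] = (
            "CUDAExecutionProvider" in ort.get_available_providers()
        )
    except ImportError:
        # ONNX Runtime is not installed; cuda_available stays False.
        pass
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A value of None for pdf_craft or both ONNX Runtime entries means the corresponding pip install step has not been done in the current environment.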
🟡 GitHub
#پایتون #Python #library
🆔 @Python4all_pro
BY پایتون ( Machine Learning | Data Science )